This article discusses the limitations of Large Language Models (LLMs) in classification tasks, focusing on their lack of uncertainty and the need for more accurate performance metrics. New benchmarks and a metric named OMNIACCURACY have been introduced to assess LLMs' capabilities in both scenarios with and without correct labels.